AITopics | model and task

Collaborating Authors

model and task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection

Nikankin, Yaniv, Arad, Dana, Itzhak, Itay, Reusch, Anja, Simhi, Adi, Kesten-Pomeranz, Gal, Belinkov, Yonatan

arXiv.org Artificial IntelligenceOct-31-2025

One of the main challenges in mechanistic interpretability is circuit discovery, determining which parts of a model perform a given task. We build on the Mechanistic Interpretability Benchmark (MIB) and propose three key improvements to circuit discovery. First, we use bootstrapping to identify edges with consistent attribution scores. Second, we introduce a simple ratio-based selection strategy to prioritize strong positive-scoring edges, balancing performance and faithfulness. Third, we replace the standard greedy selection with an integer linear programming formulation. Our methods yield more faithful circuits and outperform prior approaches across multiple MIB tasks and models. Our code is available at: https://github.com/technion-cs-nlp/MIB-Shared-Task.

artificial intelligence, machine learning, qwen-2, (17 more...)

arXiv.org Artificial Intelligence

2510.25786

Country: Oceania > New Zealand (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback

Collaborative Performance Prediction for Large Language Models

Zhang, Qiyuan, Lyu, Fuyuan, Liu, Xue, Ma, Chen

arXiv.org Artificial IntelligenceJul-1-2024

Comprehensively understanding and accurately predicting the performance of large language models across diverse downstream tasks has emerged as a pivotal challenge in NLP research. The pioneering scaling law on downstream works demonstrated intrinsic similarities within model families and utilized such similarities for performance prediction. However, they tend to overlook the similarities between model families and only consider design factors listed in the original scaling law. To overcome these limitations, we introduce a novel framework, Collaborative Performance Prediction (CPP), which significantly enhances prediction accuracy by leveraging the historical performance of various models on downstream tasks and other design factors for both model and task. We also collect a collaborative data sourced from online platforms containing both historical performance and additional design factors. With the support of the collaborative data, CPP not only surpasses traditional scaling laws in predicting the performance of scaled LLMs but also facilitates a detailed analysis of factor importance, an area previously overlooked.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2407.013

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models

Variengien, Alexandre, Winsor, Eric

arXiv.org Artificial IntelligenceDec-13-2023

When solving challenging problems, language models (LMs) are able to identify relevant information from long and complicated contexts. To study how LMs solve retrieval tasks in diverse situations, we introduce ORION, a collection of structured retrieval tasks spanning six domains, from text understanding to coding. Each task in ORION can be represented abstractly by a request (e.g. a question) that retrieves an attribute (e.g. the character name) from a context (e.g. a story). We apply causal analysis on 18 open-source language models with sizes ranging from 125 million to 70 billion parameters. We find that LMs internally decompose retrieval tasks in a modular way: middle layers at the last token position process the request, while late layers retrieve the correct entity from the context. After causally enforcing this decomposition, models are still able to solve the original task, preserving 70% of the original correct token probability in 98 of the 106 studied model-task pairs. We connect our macroscopic decomposition with a microscopic description by performing a fine-grained case study of a question-answering task on Pythia-2.8b. Building on our high-level understanding, we demonstrate a proof of concept application for scalable internal oversight of LMs to mitigate prompt-injection while requiring human supervision on only a single input. Our solution improves accuracy drastically (from 15.5% to 97.5% on Pythia-12b). This work presents evidence of a universal emergent modular processing of tasks across varied domains and models and is a pioneering effort in applying interpretability for scalable internal oversight of LMs.

dataset, mechanism, residual stream, (16 more...)

arXiv.org Artificial Intelligence

2312.10091

Country:

South America > Peru > Cusco Department > Cusco Province > Cusco (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(3 more...)

Genre: Research Report > New Finding (0.92)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

Ahuja, Sanchit, Aggarwal, Divyanshu, Gumma, Varun, Watts, Ishaan, Sathe, Ashutosh, Ochieng, Millicent, Hada, Rishav, Jain, Prachi, Axmed, Maxamed, Bali, Kalika, Sitaram, Sunayana

arXiv.org Artificial IntelligenceNov-13-2023

Recently, there has been a rapid advancement in research on Large Language Models (LLMs), resulting in significant progress in several Natural Language Processing (NLP) tasks. Consequently, there has been a surge in LLM evaluation research to comprehend the models' capabilities and limitations. However, much of this research has been confined to the English language, leaving LLM building and evaluation for non-English languages relatively unexplored. There has been an introduction of several new LLMs, necessitating their evaluation on non-English languages. This study aims to expand our MEGA benchmarking suite by including six new datasets to form the MEGAVERSE benchmark. The benchmark comprises 22 datasets covering 81 languages, including low-resource African languages. We evaluate several state-of-the-art LLMs like GPT-3.5-Turbo, GPT4, PaLM2, and Llama2 on the MEGAVERSE datasets. Additionally, we include two multimodal datasets in the benchmark and assess the performance of the LLaVa-v1.5 model. Our experiments suggest that GPT4 and PaLM2 outperform the Llama models on various tasks, notably on low-resource languages, with GPT4 outperforming PaLM2 on more datasets than vice versa. However, issues such as data contamination must be addressed to obtain an accurate assessment of LLM performance on non-English languages.

language model, megaverse, model and task, (1 more...)

arXiv.org Artificial Intelligence

2311.07463

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Best NLP Research of 2019

#artificialintelligenceFeb-23-2020, 19:53:51 GMT

Natural language processing (NLP) is one of the most important technologies to arise in recent years. Specifically, 2019 has been a big year for NLP with the introduction of the revolutionary BERT language representation model. There are a large variety of underlying tasks and machine learning models powering NLP applications. Recently, deep learning approaches have obtained very high performance across many different NLP tasks. Convolutional Neural Network (CNNs) are typically associated with computer vision, but more recently CNNs have been applied to problems in NLP.

explanation, neural network, nlp, (13 more...)

#artificialintelligence

Genre: Research Report (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback